
feature: Conversion of HSOpticalFlow Program to HIP #453

Merged
mapatel-amd merged 7 commits into amd-staging from mapatel/feature-optical-flow on Jun 5, 2026

mapatel-amd (Contributor) commented May 14, 2026

Motivation

Closes #447

Ports the NVIDIA HSOpticalFlow CUDA sample to HIP as a new Applications/optical_flow example. The Horn-Schunck variational optical flow algorithm exercises several non-trivial HIP features and techniques that have no existing example in the repository: texture objects with hipResourceTypePitch2D, cooperative groups, and Gaussian pyramids.

Technical Details

Implements the Horn-Schunck optical flow algorithm on HIP:

  • Gaussian pyramid built by repeated Downscale (4-tap filter via tex2D bilinear fetch)
  • Coarse-to-fine estimation: at each pyramid level, the flow is upscaled, the target image is warped, image derivatives (Ix, Iy, Iz) are computed, and a Jacobi
    iterative solver refines the flow
  • CPU reference (flowGold.cpp) runs the same algorithm on the host; the GPU and CPU results are compared via L1 norm per pixel (threshold: 0.05)
  • Output: writes FlowGPU.flo and FlowCPU.flo in Middlebury .flo format
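For readers unfamiliar with the solver, the per-level Jacobi refinement can be sketched on the CPU as below. This is a minimal sketch assuming the textbook Horn-Schunck update with a 4-neighbour average, clamp-to-edge borders, and unpadded row-major buffers. The `Image`, `JacobiSweep`, and `SolveHornSchunck` names are hypothetical, and the exact formulation in flowGold.cpp and the HIP kernels may differ (for example, by solving for flow increments on the warped image).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical single-channel float image, row-major with no row padding
// (the HIP example itself uses pitched buffers).
struct Image {
    int width = 0;
    int height = 0;
    std::vector<float> data;

    Image(int w, int h, float fill = 0.0f)
        : width(w), height(h), data(static_cast<std::size_t>(w) * h, fill) {}

    // Read with clamp-to-edge boundary handling.
    float at(int x, int y) const {
        x = std::clamp(x, 0, width - 1);
        y = std::clamp(y, 0, height - 1);
        return data[static_cast<std::size_t>(y) * width + x];
    }

    // Unchecked write access.
    float& operator()(int x, int y) {
        return data[static_cast<std::size_t>(y) * width + x];
    }
};

// One Jacobi sweep of the textbook Horn-Schunck update:
//   t = (Ix*ubar + Iy*vbar + It) / (alpha^2 + Ix^2 + Iy^2)
//   u = ubar - Ix*t,  v = vbar - Iy*t
// where ubar and vbar are 4-neighbour averages of the previous iterate.
void JacobiSweep(const Image& Ix, const Image& Iy, const Image& It,
                 const Image& u0, const Image& v0, float alpha,
                 Image& u1, Image& v1) {
    assert(alpha > 0.0f);  // keeps the denominator strictly positive
    assert(u1.width == u0.width && u1.height == u0.height);
    const float alpha2 = alpha * alpha;
    for (int y = 0; y < u0.height; ++y) {
        for (int x = 0; x < u0.width; ++x) {
            const float ubar = 0.25f * (u0.at(x - 1, y) + u0.at(x + 1, y) +
                                        u0.at(x, y - 1) + u0.at(x, y + 1));
            const float vbar = 0.25f * (v0.at(x - 1, y) + v0.at(x + 1, y) +
                                        v0.at(x, y - 1) + v0.at(x, y + 1));
            const float ix = Ix.at(x, y), iy = Iy.at(x, y), it = It.at(x, y);
            const float t = (ix * ubar + iy * vbar + it) / (alpha2 + ix * ix + iy * iy);
            u1(x, y) = ubar - ix * t;
            v1(x, y) = vbar - iy * t;
        }
    }
}

// Runs `iterations` sweeps, ping-ponging between two buffer pairs the way a
// GPU solver alternates input and output arrays. Results end up in u and v.
void SolveHornSchunck(const Image& Ix, const Image& Iy, const Image& It,
                      float alpha, int iterations, Image& u, Image& v) {
    Image uNext = u, vNext = v;
    for (int i = 0; i < iterations; ++i) {
        JacobiSweep(Ix, Iy, It, u, v, alpha, uNext, vNext);
        std::swap(u, uNext);
        std::swap(v, vNext);
    }
}
```

On the GPU, each sweep would typically map to one kernel launch with one thread per pixel, with the two buffer pairs swapped between launches; the ping-pong above mirrors that structure.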

Key porting changes from CUDA:

CUDA                                 HIP
helper_functions.h, sdkLoadPPM4ub    stb_image.h (already in-repo)
findCudaDevice(argc, argv)           hipGetDeviceProperties(&props, 0)
cooperative_groups.h                 hip/hip_cooperative_groups.h
StrideAlignment = 32                 StrideAlignment = 64 (ROCm requires pitchInBytes to be a multiple of 256 bytes for hipResourceTypePitch2D; 64 floats × 4 bytes = 256 bytes)
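The StrideAlignment change comes down to arithmetic: each row is padded to a multiple of StrideAlignment floats, so the pitch in bytes is that padded stride times sizeof(float). The minimal sketch below uses hypothetical helper names (`AlignUp`, `MakeRowLayout`, not necessarily those in the PR) and assumes 4-byte floats, to show why a 64-float alignment always yields 256-byte-multiple pitches while 32 floats only guarantees 128-byte multiples.

```cpp
#include <cassert>
#include <cstddef>

static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");

// Round n up to the next multiple of m (hypothetical helper name).
constexpr int AlignUp(int n, int m) {
    assert(m > 0);
    return ((n + m - 1) / m) * m;
}

// Row layout of a pitched float image: padded stride in elements and the
// matching pitch in bytes (the value that would be passed as pitchInBytes).
struct RowLayout {
    int strideElems;
    std::size_t pitchBytes;
};

constexpr RowLayout MakeRowLayout(int width, int strideAlignment) {
    const int stride = AlignUp(width, strideAlignment);
    return RowLayout{stride, static_cast<std::size_t>(stride) * sizeof(float)};
}

// 64 floats * 4 bytes = 256 bytes, so any width is padded to a 256-byte multiple.
static_assert(MakeRowLayout(600, 64).pitchBytes % 256 == 0);
// 32 floats only guarantee 128-byte multiples: 600 -> 608 floats = 2432 bytes.
static_assert(MakeRowLayout(600, 32).pitchBytes == 2432);
```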

All four kernel wrappers (downscaleKernel, upscaleKernel, warpingKernel, derivativesKernel) read their inputs through texture objects created with hipResourceTypePitch2D, hipAddressModeMirror, and hipFilterModeLinear, exercising the texture object API the way it is typically used in image-processing workloads.
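To make the addressing and filtering modes concrete, here is a rough CPU model of what a bilinear fetch with mirror addressing returns from a pitched single-channel float image. It is an approximation under stated assumptions: unnormalized coordinates with texel centres at i + 0.5, full-precision interpolation weights (texture hardware uses reduced-precision weights), and mirroring applied to integer texel indices with the edge texel repeated. `Mirror` and `Tex2DLinearMirror` are hypothetical names. In CUDA, mirror addressing is documented as applying only to normalized coordinates, so a kernel using normalized (u, v) would multiply by width and height before using this model.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mirror an integer texel index into [0, n): ... -2 -> 1, -1 -> 0, n -> n-1,
// i.e. the image is reflected about its border with the edge texel repeated.
int Mirror(int i, int n) {
    const int period = 2 * n;
    int m = i % period;
    if (m < 0) m += period;
    return m < n ? m : period - 1 - m;
}

// CPU model of a bilinear texture fetch with mirror addressing on a pitched
// single-channel float image. (x, y) are unnormalized coordinates with texel
// centres at i + 0.5; `stride` is the row length in elements (>= width).
float Tex2DLinearMirror(const std::vector<float>& img, int width, int height,
                        int stride, float x, float y) {
    assert(stride >= width);
    const float xb = x - 0.5f, yb = y - 0.5f;
    const float fx = std::floor(xb), fy = std::floor(yb);
    const float a = xb - fx, b = yb - fy;  // fractional interpolation weights
    const int i = static_cast<int>(fx), j = static_cast<int>(fy);
    auto texel = [&](int ii, int jj) {
        return img[static_cast<std::size_t>(Mirror(jj, height)) * stride +
                   Mirror(ii, width)];
    };
    return (1.0f - a) * (1.0f - b) * texel(i, j) + a * (1.0f - b) * texel(i + 1, j) +
           (1.0f - a) * b * texel(i, j + 1) + a * b * texel(i + 1, j + 1);
}
```

Sampling just outside the left or right border returns the edge texel rather than zero, which is why mirror addressing suits the Downscale and warping steps that read near image edges.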

Test Plan

Built and run with make on a ROCm-capable AMD GPU:

cd Applications/optical_flow
make
./optical_flow

Input: two consecutive frames from the Middlebury optical flow dataset (data/frame10.ppm, data/frame11.ppm).

Test Result

HSOpticalFlow Starting...

  Using device: <GPU name>
  Loading "data/frame10.ppm" ...
  Loading "data/frame11.ppm" ...
  L1 error : 0.000xxx

  Program exited with EXIT_SUCCESS. L1 error between GPU and CPU results was well below the 0.05 threshold.
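The pass/fail check can be sketched as a mean per-pixel L1 difference between the GPU and CPU flow fields. This sketch assumes the error is |Δu| + |Δv| summed over the valid width × height pixels (ignoring pitch padding) and divided by the pixel count, then compared against the 0.05 threshold; `MeanL1Error` and `FlowMatches` are hypothetical names, and the PR's exact normalization is not shown here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean per-pixel L1 difference between a test flow (u, v) and a reference flow
// (uRef, vRef). All four arrays use rows of `stride` floats; padding columns
// beyond `width` are ignored. Accumulates in double to limit rounding error.
float MeanL1Error(const std::vector<float>& uRef, const std::vector<float>& vRef,
                  const std::vector<float>& u, const std::vector<float>& v,
                  int width, int height, int stride) {
    assert(width > 0 && height > 0 && stride >= width);
    double sum = 0.0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t idx = static_cast<std::size_t>(y) * stride + x;
            sum += std::fabs(u[idx] - uRef[idx]) + std::fabs(v[idx] - vRef[idx]);
        }
    }
    return static_cast<float>(sum / (static_cast<double>(width) * height));
}

// Pass/fail check against the 0.05 threshold quoted in the PR description.
bool FlowMatches(const std::vector<float>& uRef, const std::vector<float>& vRef,
                 const std::vector<float>& u, const std::vector<float>& v,
                 int width, int height, int stride, float threshold = 0.05f) {
    return MeanL1Error(uRef, vRef, u, v, width, height, stride) < threshold;
}
```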

Added/Updated documentation?

  • Yes
    • Applications/optical_flow/README.md added with description, build/run instructions, key API table, and HIP vs CUDA differences
    • Root-level README.md updated to list the new example (if not already done — please verify)

Submission Checklist

mapatel-amd linked an issue May 14, 2026 that may be closed by this pull request
mapatel-amd self-assigned this May 14, 2026
mapatel-amd marked this pull request as ready for review May 15, 2026 16:59
mapatel-amd requested review from a team as code owners May 15, 2026 16:59

zichguan-amd (Collaborator) left a comment

Please update the root README to include Applications/optical_flow, and Applications/Makefile to include the example.

Comment thread Applications/optical_flow/derivativesKernel.hip
Comment thread Applications/optical_flow/downscaleKernel.hip
Comment thread Applications/optical_flow/main.hip
Comment thread Applications/optical_flow/Makefile Outdated
Comment thread Applications/optical_flow/CMakeLists.txt Outdated
Comment thread .gitignore Outdated
- Add #include <hip/hip_runtime.h> to derivativesKernel.hip and
  downscaleKernel.hip (reviewer request for self-contained headers)
- Fix LoadImageAsFP32 docstring: loads RGB (3-channel), not 4-channel
- Update copyright year to 2026 in CMakeLists.txt and Makefile
- Add optical_flow to Applications/Makefile EXAMPLES list
- Add optical_flow entry to root README.md
- Remove accidental .venv/ entry from root .gitignore
- Rewrite optical_flow README with accurate build/run instructions:
  correct CMake compiler flag (clang++ not hipcc), ROCM_PATH and
  LD_LIBRARY_PATH env vars, working-directory-independent run instructions

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Comment thread Applications/optical_flow/upscaleKernel.hip
mapatel-amd merged commit d3ad835 into amd-staging Jun 5, 2026
16 checks passed
mapatel-amd deleted the mapatel/feature-optical-flow branch June 5, 2026 19:33

Development

Successfully merging this pull request may close these issues.

[Feature]: opticalFlow HIP program

2 participants